This section provides a high-level summary of the project and of the work the team did to reach the project's end goal.
SAP is developing a new product called Consumer Insight 365 (CI365) to help businesses better manage and expand their markets. Mobile carriers hold an enormous amount of unused data on how consumers use their cellphones. Carriers can monetize this data and also gain insight into their customer base. CI365 is a tool that puts this data to work by analyzing:
Using a small carrier's mobile user data, determine correlations between texting and calling habits, URL categories, and geolocation on the one hand and user gender and age on the other. SAP wants to be able to determine a mobile user's gender and general age from his or her phone habits.
Due to data integrity issues, schedule constraints, and the incomplete data the team received, none of the models the team developed achieved an accuracy higher than 62%. CHAID was more accurate when more parameters and numeric values had to be evaluated, whereas Bayes was more accurate with the binary, simpler inputs in the training set. The algorithms predicted female more often than male, which made their male predictions more accurate. Rather than depending on the demographics of the training sets, the accuracy results depended more on which testing set was used. Grouping the training data by age and gender had little effect on how the algorithms learned or on how they performed on the testing sets.
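The report does not describe how the models were implemented. As a hypothetical illustration of two points above (a Bayes classifier working on binary inputs, and why predicting female more often can make male predictions more accurate), the following Python sketch implements a Bernoulli naive Bayes classifier with Laplace smoothing and reports overall accuracy alongside per-predicted-class accuracy (precision). The binary feature encoding, the function names, and the toy data are assumptions for illustration only, not the team's code or data; CHAID is not shown.

```python
import math
from collections import Counter

# Hypothetical sketch: Bernoulli naive Bayes over binary features
# (e.g. "texts more than N times a day", "visited a sports URL").
# Features, labels, and data below are illustrative assumptions.

def train_bernoulli_nb(X, y, alpha=1.0):
    """Estimate class priors and P(feature=1 | class) with Laplace smoothing."""
    classes = sorted(set(y))
    n_features = len(X[0])
    priors, likelihoods = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        priors[c] = len(rows) / len(X)
        likelihoods[c] = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
    return classes, priors, likelihoods

def predict(model, x):
    """Return the class with the highest log-posterior for binary vector x."""
    classes, priors, likelihoods = model
    best_class, best_score = None, -math.inf
    for c in classes:
        # Summing log-probabilities avoids underflow with many features.
        score = math.log(priors[c])
        for j, v in enumerate(x):
            p = likelihoods[c][j]
            score += math.log(p if v else 1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

def accuracy(y_true, y_pred):
    """Fraction of all predictions that were correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_predicted_class_accuracy(y_true, y_pred):
    """For each predicted class, the fraction of those predictions that were correct."""
    predicted = Counter(y_pred)
    correct = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / predicted[c] for c in predicted}

# Toy training data: feature 0 is more common among "F" users.
model = train_bernoulli_nb([[1, 0], [1, 1], [0, 1], [0, 0]], ["F", "F", "M", "M"])
print(predict(model, [1, 0]))  # → F
print(predict(model, [0, 1]))  # → M

# A classifier that over-predicts "F": "M" is predicted rarely but always correctly.
y_true = ["M", "M", "M", "F", "F", "F"]
y_pred = ["M", "F", "F", "F", "F", "F"]
print(round(accuracy(y_true, y_pred), 3))             # → 0.667
print(per_predicted_class_accuracy(y_true, y_pred))   # → {'M': 1.0, 'F': 0.6}
```

In the toy evaluation, overall accuracy is 67%, yet every "M" prediction is correct because the model only predicts "M" when the evidence is strong; the extra "F" predictions absorb the errors. This is one plausible way to read the report's observation, not a reconstruction of its results.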